Minimum signal   False positive   False negative   proba(0.1)   proba(0.2)
peptide score    rate             rate

3.5              0.121            0.036            0.467        0.664
4                0.096            0.06             0.619        0.708
4.5              0.078            0.079            0.565        0.745
5                0.062            0.098            0.615        0.782
5.5              0.05             0.127            0.659        0.813
6                0.04             0.163            0.694        0.836
6.5              0.033            0.202            0.725        0.855
7                0.025            0.248            0.763        0.878
7.5              0.021            0.304            0.78         0.889
8                0.015            0.368            0.816        0.909
8.5              0.012            0.418            0.836        0.92
9                0.009            0.612            0.856        0.93
9.5              0.007            0.681            0.863        0.934
10               0.006            0.679            0.835        0.919
Score curves

Influence of minimum score on signal peptide recognition
Minimum signal   All ESTs    New ESTs    ESTs matching       ESTs extending    ESTs extending
peptide score                            public EST closer   known mRNA more   public EST more
                                         than 40 bp from     than 40 bp        than 40 bp
                                         beginning

3.5              2674        947         599                 23                150
4                [missing]   7H4         48H                 23                IZo
4.5              1943        647         425                 22                112
5                1657        523         353                 21                96
5.5              1417        419         307                 19                80
6                1190        340         238                 18                68
6.5              1035        280         166                 18                60
7                893         219         [missing]           15                46
7.5              [missing]   173         132                 12                36
8                [missing]   133         101                 11                29
8.5              [missing]   [missing]   83                  8                 26
9                [missing]   81          63                  6                 24
9.5              364         57          48                  6                 18
10               303         47          35                  [missing]         16
Tissue                  All ESTs   New ESTs   ESTs matching       ESTs extending    ESTs extending
                                              public EST closer   known mRNA more   public EST more
                                              than 40 bp from     than 40 bp        than 40 bp
                                              beginning

Brain                   329        131        75                  3                 24
Cancerous prostate      134        40         37                  1                 [illegible]
Cerebellum              17         9          1                   0                 6
Colon                   21         11         4                   0                 0
Dystrophic muscle       41         18         8                   0                 [illegible]
Fetal brain             70         37         16                  0                 1
Fetal kidney            227        116        46                  1                 19
Fetal liver             13         7          2                   0                 0
Heart                   30         15         7                   0                 1
Hypertrophic prostate   86         23         22                  2                 2
Kidney                  10         7          3                   0                 0
Large intestine         21         8          4                   0                 1
Liver                   23         9          6                   0                 0
Lung                    24         12         4                   0                 1
Lung (cells)            57         38         6                   0                 [illegible]
Lymph ganglia           163        60         23                  2                 12
Lymphocytes             23         6          4                   0                 2
Muscle                  33         16         6                   0                 4
Normal prostate         101        61         45                  7                 11
Ovary                   90         57         12                  1                 [illegible]
Pancreas                48         11         6                   0                 1
Placenta                24         5          1                   0                 0
Prostate                34         16         4                   0                 2
Spleen                  56         28         10                  0                 1
Substantia nigra        108        47         27                  1                 6
Surrenals               15         3          3                   1                 0
Testis                  131        68         25                  1                 8
Thyroid                 17         8          2                   0                 2
Umbilical cord          65         17         12                  1                 3
Uterus                  28         15         3                   0                 2
Non tissue-specific     566        48         177                 2                 28
Total                   2677       947        601                 23                150
BamHI 2772 ClaI 2666



Plasmid name: pED6dpc2
Plasmid size: 5374 bp



Comments/References: pED6dpc2 is derived from pED6dpc1 by insertion of a new
polylinker to facilitate cDNA cloning. SST cDNAs are cloned between EcoRI and NotI.
pED vectors are described in Kaufman et al. (1991), NAR 19: 4485-4490.
Description of Transcription Factor Binding Sites present on promoters isolated from
Promoter sequence P13H2 (548 bp):

Matrix            Position   Orientation   Score   Length   Sequence

CMYB_01           -602       +             0.903   9        TGTCAGTTG
MYOD_Q6           401        [?]           0.061   10       CCCAACTGAC
S8_01             [?]        -             0 060   11       AATAGAATTAG
S8_01             -425       +             0.066   11       AACTAAATTAG
DELTAEF1_01       -300       [?]           0460    11       GCACACCTCAG
GATA_C            [?]        -             0464    11       AGATAAATCCA
CMYB_01           -349       +             0458    9        CTTCAGTTG
GATA1_02          -343       [?]           0.060   14       TTGTAGATAGGACA
GATA_C            43d        [?]           0453    11       AGATAGGACAT
TAL1ALPHAE47_01   -235       +             0*973   16       CATAACAGATGGTAAG
TAL1BETAE47_01    -236       +             0.883   16       CATAACAGATGGTAAG
TAL1BETAITF2_01   -235       +             0478    16       CATAACAGATGGTAAG
MYOD_Q6           -232       -             0454    10       ACCATCTGTT
GATA1_04          -217       -             0453    13       TGAAGATAAAGTA
IK1_01            -12*       +             0463    13       AGTTGGGAATTCC
IK2_01            -126       [?]           0485    12       AGTTGGGAATTC
CREL_01           -123       +             0462    10       TGGGAATTCC
GATA1_02          46         +             0450    14       TCAGTGATATGGCA
SRY_02            -41        -             0461    12       TAAAACAAAACA
E2F_02            -33        [?]           0457    8        TTTAGCGC
MZF1_01           4          -             0475    8        TGAGGGGA



Promoter sequence P16B4 (861 bp):

Matrix         Position   Score   Length   Sequence

NFY_Q6         -748       0.956   11       GGACCAATCAT
MZF1_01        -738       0462    8        CCTGGGGA
CMYB_01        484        a994    9        TGACCGTTG
VMYB_02        [?]        0.085   9        TCCAACGGT
STAT_01        473        0.066   9        TTCCTGGAA
STAT_01        473        0451    9        TTCCAGGAA
MZF1_01        456        0.956   8        TTGGGGGA
IK2_01         -451       0.065   12       GAATGGGATTTC
MZF1_01        -424       0488    8        AGAGGGGA
SRY_02         408        0455    12       GAAAACAAAAGA
MZF1_01        -216       0460    8        GAAGGGGA
MYOD_Q6        -190       0481    10       AGCATCTGCC
DELTAEF1_01    -176       0.955   11       TCCCACCTTCC
S8_01          6          aoo2    11       GAGGCAATTAT
MZF1_01        16         0.986   8        AGAGGGGA

Orientation (seven entries legible, row assignment lost): + + + + + + +



Promoter sequence P29B6 (555 bp):

Matrix         Position   Orientation   Score   Length   Sequence

ARNT_01        411        +             0464    16       GGACTCACGTGCTGCT
NMYC_01        409        +             0466    12       ACTCACGTGCTG
USF_01         409        +             0485    12       ACTCACGTGCTG
USF_01         409        [?]           0485    12       CAGCACGTGAGT
NMYC_01        409        [?]           0456    12       CAGCACGTGAGT
MYCMAX_02      409        -             0472    12       CAGCACGTGAGT
USF_C          407        +             0497    8        TCACGTGC
USF_C          407        -             0.991   8        GCACGTGA
MZF1_01        -292       [?]           0466    8        CATGGGGA
ELK1_02        -105       +             0.963   14       CTCTCCGGAAGCCT
CETS1P54_01    -102       +             0474    10       TCCGGAAGCC
AP1_Q4         -42        [?]           0463    11       AGTGACTGAAC
AP1FJ_Q2       -42        -             0461    11       AGTGACTGAAC
PADS_C         45         +             1.000   9        TGTGGTCTC
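
The listing does not state how sites on the opposite strand are written. The sequences themselves are consistent with a reverse-complement convention: GCACGTGA is the reverse complement of TCACGTGC, which occurs inside GGACTCACGTGCTGCT. A minimal Python sketch of that check follows; the function name is illustrative and not taken from the source.

```python
# Hypothetical helper, not part of the source document.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string (A, C, G, T)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    site = "GCACGTGA"                # 8-mer from the P29B6 listing
    longer = "GGACTCACGTGCTGCT"      # 16-mer from the same listing
    rc = reverse_complement(site)
    print(rc, rc in longer)
```

The same check can be used to spot OCR damage: two sites reported at the same position on opposite strands should be reverse complements of each other.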
98.2% identity in 113 aa overlap

         10 20 30 40 50 60
SeqID214 MVIRWIASSSGSTAIKKKQQDVLGFLE^NKIGFEEKDIAANEENRKWMRENVPENSRPA
AF042081 MVIRVYIASSSGSTAIKKKQQDVLGFLEANKIGFEEKDIAANEE^KWMRENVPENSRPA
         10 20 30 40 50 60

         70 80 90 100 110
SeqID214 TGNPLPPQIFNESQYRGDYDAFFEARENNAVYAFLGLTAPSGSKEAEVQAKQQ
AF042081 TGYP^^QIFNESQYRQDYDAFFEARENNAVYAFLGLTAPPGSKEAEVQAKQQ
         70 80 90 100 110



FIGURE 9
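
The identity figures quoted in these alignments are the share of overlapping positions that carry the same residue; 98.2% over 113 aa corresponds to 111 identical positions (111/113 ≈ 0.982). The Python sketch below illustrates that arithmetic under the assumption that gaps are written as "-" and excluded from the overlap; the function names are hypothetical and the alignment program used in the source is not specified.

```python
# Hypothetical helpers illustrating the percent-identity arithmetic.

def percent_identity(a: str, b: str) -> float:
    """Percent of overlapping (non-gap) aligned positions with identical residues."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no overlapping positions")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def implied_matches(identity_pct: float, overlap: int) -> int:
    """Number of identical positions implied by a reported identity and overlap."""
    return round(identity_pct * overlap / 100.0)


if __name__ == "__main__":
    print(implied_matches(98.2, 113))  # 111
```

Because the printed sequences contain OCR damage, recomputing identity from them will not necessarily reproduce the quoted percentages.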






seqID215 MADDLKRFLYKKLPSVEGLHAIWSDRDGVPVIKVANDNAPEHALRPGFLSTFALATDQG
seqID185 MADDLKRFLYKKLPSVEGLHAIWSDRDGVPWKVANDNAPEHALRPGFLSTFALATDQG
AF082526 MADDLKRFLYKKLPSVEGLHAIWSDRDGVPVIKVANDSAPEHALRPGFLSTFALATDQG
         ******* ****************************** *********************

seqID215 SKLGLSKNKSIICYYNTYQWQFNRLPLWSFIASSSANTGLIVSLEKELAPLFEELRQV
seqID185 SKLGLSKNKSIICYYNTYQWQFNRLPLWSFIASSSANTGLIVSLEKELAPLFEELRQV
AF082526 SKLGLSKNKSIICYYNTYQVVQFNRLPLWSFIASSSANTGLIVSLEKELAPLFEELIKV
         ***************** ****************** ### ^ ***************** #

seqID215 VEVS
seqID185 VEVS
AF082526 VEVS
         ****



FIGURE 10
91.3% identity in 230 aa overlap

         10 20 30 40 50 60
SeqID186 MASLGLQLVGYILGLLGLLGTLVAMLLPSWKTSSYVGASIVTAVGFSKGLWMECATHSTG
AF072128 MASLGVQLVGYILGLLGLLGTSIAMLLPNWRTSSYVGASIVTAVGFSKGLWMECATHSTG
         10 20 30 40 50 60

         70 80 90 100 110 120
SeqID186 ITQCDIYSTLLGLPADIQAAQAMMVTSSAISSLACIISWGMRCTVFCQESRAKDRVAVA
AF072128 ITQCDIYSTLIX3LPADIQAAQAMMVTSSAMSSIACIISWGMRCTVFCQDSRAKDRVAW
         70 80 90 100 110 120

         130 140 150 160 170 180
SeqID186 GGVFFILGGLLGFIPVAWNLHGILRDFYSPLVPDSMKFEIGEALYLGIISSLFSLIAGII
AF072128 GGVFFILGGILGFIPVAWNLHGILRDFYSPLVPDSMKFEIGEALYLGIISALFSLVAGVI
         130 140 150 160 170 180

         190 200 210 220 230
SeqID186 LCFSCSSQRNRSNYYDAYQAQPLATRSSPRPGQPPKVKSEFNSYSLTGYV
AF072128 LCFSCSPQGNRTNYYDGYQAQPLATRSSPRSAQQPKAKSEFNSYSLTGYV
         190 200 210 220 230



FIGURE 11
98.3% identity in 121 aa overlap

                               10 20 30
seqID213                      RFRKETDNAAIIMKVDKDRQMWLEEEFRNISPEELKME
AB001993 MSDSLWCEVDPELTEKLRKFRFRKETDNAAIIMKVDKDRQMVVLEEEFQNISPEELKME
         10 20 30 40 50 60

         40 50 60 70 80 90
seqID213 LPERQPRFWYSYKYVRDDGRVSYPLCFIFSSPVGCKPEQQMMYAGSKNRLVQTAELTKV
AB001993 LPERQPRFWYSYKYVHDDGRVSYPLCFIFSSPVGCKPEQQMMYAGSKNRLVQTAELTKV
         70 80 90 100 110 120

         100 110 120
seqID213 FEIRTTDDLTEAWLQEKLSFFR
AB001993 FEIRTTDDLTEAWLQEKLSFFR
         130 140



FIGURE 12
95.6% identity in 91 aa overlap

seqID191               MGCVFQSTEDKCIFKIDWTLS
W36955   MFCPLKLILLPVLLDYSLGLNDLNVSPPELTVHTC
         10 20 30 40 50 60

         30 40 50 60 70 80
seqID191 PGEHAKDEYVLYYYSNLSVPIGRFQNRVHLMGDILCNDGSLLLQDVQEADQGTYICEIRL
W36955   PGEHAKDEYVLYYYSNLSVPIGRFQNRVHLMGDNLCNEK3SLLLQDVQD^^
         70 80 90 100 110

         90 100
seqID191 KGESQVFKKAWLHVLPEEPKGTQMLT



FIGURE 13

99.0% identity in 381 aa overlap

         10 20 30 40 50 60
seqID200 MLLSIGMLMLSATQVYTVLTVQLFAFLNPLPVEADILAYNFENASQTFDDLPARFGYRLP
AF037204 MLLSIGMLMLSATQVYTILTVQLFAFLNLLPVF^
         10 20 30 40 50 60

         70 80 90 100 110 120
seqID200 AEGLKGFLINSKPF^ACEPIVPPPVKDNSSGTFIVLIRRLDCNFDIKVLNAQRAGYKAAI
AF037204 AEGLKGFLINSKPENACEPIVPPPVKDNSSGTFIVLIRRLDCNFDIKVI^AQRAGYKAAI
         70 80 90 100 110 120

         130 140 150 160 170 180
seqID200 VHNVDSDDLISMGSNDIEVLKKIDIPSVFIGESSASSLKDEFTYEKGGHLILVPEFSLPL
AF037204 VHNVDSDDLISMGSNDIEVLKKIDIPSVFIGESSANSLKDEFTYEKGGHLILVPEFSLPL
         130 140 150 160 170 180

         190 200 210 220 230 240
seqID200 EYYLIPFLIIVGICLILIVIFMITKFVQDRHRARRNRLRKDQLKKLPVHKFKKGDEYDVF
AF037204 EYYLIPFLIIVGICLILIVIFMITKFVQDRHRARRNRLPJCDQLKKLPVHKFKKGDEYD^fc
         190 200 210 220 230 240

         250 260 270 280 290 300
seqID200 AICLDEYEIXJDKLRILPCSHAYHCKCVDPWLTKTKKTCPVcbQKVVPSQGDSDSDTDSSQ
AF037204 AICLDEYEIXSDKLRILPCSHAYHCKCVDrWL^
         250 260 270 280 290 300

         310 320 330 340 350 360
seqID200 EENEVTEHTPLLRPLASVSAQSFGALSESRSHQNMTESSDYEEDDNEDTDSSDAENEINE
AF037204 EENEWEHTPLLRPI^SVSAQSFGALSESRSHQl^ESSDYEEDDNEDTDSSDAENEINE
         310 320 330 340 350 360

         370 380
seqID200 HDVWQLQPNGERDYNIANTV
AF037204 HDVWQLQPNGERDYNIANTV
         370 380



FIGURE 14
100.0% identity in 68 aa overlap

         10 20 30 40 50 60
seqID192 MSVTWGFVGFLVPWFIPKGPNRGVIITMLVTCSVCCYLFWLIAILAQLNPLFGPQLKNET
Y15286   MSVFWGFVGFLVPWFIPKGPNRGVIITMLVTCSVCCYLFWLIAILAQLNPLFGPQLKNET
         20 30 40 50 60 70

seqID192 IWYLKYHW
Y15286   IWYLKYHW
         80



FIGURE 15
seqID201 -MDSRVS--SPEKQDKENFVGVNNKRLGVCGWILFSLSFLLVIITFPISIWMCLKIIREY
seqID227 MWLDP VFPLFPVG DH
X85116   MAEKRHTRDSEAQRLPDSFKDSPSKGLGPCGWILVAFSFLFTVITFPISIWMCIKIIKEY
         * *

seqID201 ERAVWRLGRIQADKAKGPGLILVLPCIDVFVKVDLRTVTCNIPPQEILTRDSVTTQVDG
seqID227 y LPHLHMDVLEG ~ - LILVLPCIDVFVKVDLRTVTCNIPPQEILTRDSVTTQVDG
X85116   ERAIIFRLGRILQGGAKGPGLFFILPCTDSFIKVDMRTISFDIPPQEILTKDSVTISVDG
         * .. * * *** * * *** ** ******** **** ***

seqID201 VVYYRIYSAVSAVANVNDVHQATFLLAQTTLRNVLGTQTLSQILAGFEEIAHSIQTLLDD
seqID227 VVYYRIYSAVSAVANWDVHQATFLIAQTTLRNVLGTQTLSQILAGREEIAHSIQTLLDD
X85116   VVYYRVQNATl^VANITNADSATRLLAQTTLRNVLGTKNLSQILS ^EEIAHNMQSTLDD
         ***** * **** ** ************* ***** ****** * ***

seqID201 ATELWGIRVARVEITOWIPVQLQRSMAAEAEATREAPJUCVLAAEGEMSASKSLKSASMV
seqID227 ATELWGIRVARVEIltvRIPVQLQRSMAAEAEATREARAKVLAAEGEMNASKSLKSASMV
X85116   ATDAWGIK\^RVEIKp VKLPVQLQRAMAAEAEASREARAKVIAAEGF^ASRAT,KFA.gTVfV
         ** 4 **** *******_************* ******* ****** ** # * ****

seqID201 LAESPIALQLRYLQTLSTVATEKNSTIVFPLPMNILEGIGGVSYDNHKKLPNKA
seqID227 LAESPIALQLRYLQTLSTVATEKNSTIVFPLPMNILEGIGGVSYDNHKKLPNKA
X85116   ITESPAALQLRYLQTLTTIAAEKNSTIVFPLPIDMLQGIIGAKHSHLG
         *** ********** * * *********** * ** *



FIGURE 16



